A Reliable Randomized Algorithm for the Closest-Pair Problem
Abstract
The following two computational problems are studied:

Duplicate grouping: Assume that n items are given, each of which is labeled by an integer key from the set {0, ..., U − 1}. Store the items in an array of size n such that items with the same key occupy a contiguous segment of the array.

Closest pair: Assume that a multiset of n points in the d-dimensional Euclidean space is given, where d ≥ 1 is a fixed integer. Each point is represented as a d-tuple of integers in the range {0, ..., U − 1} (or of arbitrary real numbers). Find a closest pair, i.e., a pair of points whose distance is minimal over all such pairs.

In 1976 Rabin described a randomized algorithm for the closest-pair problem that takes linear expected time. As a subroutine, he used a hashing procedure whose implementation was left open. Only years later were randomized hashing schemes suitable for filling this gap developed. In this paper, we return to Rabin's classic algorithm in order to provide a fully detailed description and analysis, thereby also extending and strengthening his result. As a preliminary step, we study randomized algorithms for the duplicate-grouping problem. In the course of solving the duplicate-grouping problem, we describe a new universal class of hash functions of independent interest.

It is shown that both of the problems above can be solved by randomized algorithms that use O(n) space and finish in O(n) time with probability tending to 1 as n grows to infinity. The model of computation is a unit-cost RAM capable of generating random numbers and of performing arithmetic operations from the set {+, −, ∗, div, log2, exp2}, where div denotes integer division and log2 and exp2 are the mappings from ℕ to ℕ ∪ {0} with log2(m) = ⌊log_2 m⌋ and exp2(m) = 2^m, for all m ∈ ℕ. If the operations log2 and exp2 are not available, the running time of the algorithms increases by an additive term of O(log log U). All numbers manipulated by the algorithms consist of O(log n + log U) bits.
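As an illustration of the duplicate-grouping task defined above, here is a minimal Python sketch. It is not the paper's method: it relies on Python's built-in `dict` (a general-purpose hash table) rather than the universal class of hash functions the paper introduces, and the name `group_duplicates` and its `key` parameter are hypothetical.

```python
from collections import defaultdict


def group_duplicates(items, key=lambda item: item):
    """Return a list in which items with equal keys are contiguous.

    Assumption: keys are hashable. Buckets are emitted in order of first
    appearance, which is one of many valid groupings.
    """
    buckets = defaultdict(list)  # key -> items carrying that key, in input order
    for item in items:
        buckets[key(item)].append(item)

    grouped = []
    for same_key_items in buckets.values():
        grouped.extend(same_key_items)
    return grouped
```

For example, `group_duplicates([3, 1, 3, 2, 1])` places the two 3s next to each other and the two 1s next to each other. Note that the output need not be sorted; the problem only asks for equal keys to be adjacent.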
The algorithms for both of the problems exceed the time bound O(n) or O(n + log log U) with probability 2^(−n^Ω(1)). Variants of the algorithms are also given that use only O(log n + log U) random bits and have probability O(n^(−α)) of exceeding the time bounds, where α ≥ 1 is a constant that can be chosen arbitrarily. The algorithm for the closest-pair problem also works if the coordinates of the points are arbitrary real numbers, provided that the RAM is able to perform arithmetic operations from {+, −, ∗, div} on real numbers, where a div b now means ⌊a/b⌋. In this case, the running time is O(n) with log2 and exp2 and O(n + log log(δmax/δmin)) without them, where δmax is the maximum and δmin is the minimum distance between any two distinct input points.
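To make the role of hashing in the closest-pair problem concrete, the sketch below hashes points into grid cells. It is a swapped-in technique, not the algorithm analyzed in the paper: it uses the simpler randomized incremental grid approach (insert points in random order and rebuild the grid whenever the best distance shrinks) instead of Rabin's sampling scheme, and Python's `dict` stands in for the hashing procedure. The name `closest_pair` and the `seed` parameter are hypothetical.

```python
import itertools
import math
import random


def closest_pair(points, seed=None):
    """Return (distance, p, q) for a closest pair among at least two points.

    Points are equal-length tuples of numbers; duplicates give distance 0.
    """
    pts = list(points)
    if len(pts) < 2:
        raise ValueError("need at least two points")
    random.Random(seed).shuffle(pts)  # random insertion order
    dim = len(pts[0])
    offsets = list(itertools.product((-1, 0, 1), repeat=dim))

    def cell_of(p, side):
        # Grid cell containing p; uses floor division as in 'a div b'.
        return tuple(math.floor(c / side) for c in p)

    def build(prefix, side):
        grid = {}
        for p in prefix:
            grid.setdefault(cell_of(p, side), []).append(p)
        return grid

    best = (math.dist(pts[0], pts[1]), pts[0], pts[1])
    if best[0] == 0:
        return best
    grid = build(pts[:2], best[0])

    for i in range(2, len(pts)):
        p = pts[i]
        cell = cell_of(p, best[0])
        improved = False
        # Any point closer than the cell side lies in an adjacent cell.
        for off in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, off))
            for q in grid.get(neighbour, ()):
                d = math.dist(p, q)
                if d < best[0]:
                    best = (d, q, p)
                    improved = True
        if best[0] == 0:
            return best
        if improved:
            grid = build(pts[:i + 1], best[0])  # smaller cells are needed now
        else:
            grid.setdefault(cell, []).append(p)
    return best
```

The cell side always equals the current best distance, so each cell holds only a bounded number of points for fixed d, and checking the 3^d neighbouring cells is enough to find any closer point.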
Related resources
Dominance Product and High-Dimensional Closest Pair under L_infty
Given a set S of n points in R^d, the Closest Pair problem is to find a pair of distinct points in S at minimum distance. When d is constant, there are efficient algorithms that solve this problem, and fast approximate solutions for general d. However, obtaining an exact solution in very high dimensions seems to be much less understood. We consider the high-dimensional L∞ Closest Pair problem, wh...
Space-efficient geometric divide-and-conquer algorithms
We develop a number of space-efficient tools including an approach to simulate divide-and-conquer space-efficiently, stably selecting and unselecting a subset from a sorted set, and computing the kth smallest element in one dimension from a multi-dimensional set that is sorted in another dimension. We then apply these tools to solve several geometric problems that have solutions using some form...
A Geometric Interpretation for Local Alignment-Free Sequence Comparison
Local alignment-free sequence comparison arises in the context of identifying similar segments of sequences that may not be alignable in the traditional sense. We propose a randomized approximation algorithm that is both accurate and efficient. We show that under D2 and its important variant D 2 as the similarity measure, local alignment-free comparison between a pair of sequences can be formul...
Efficient Randomized Incremental Algorithm For The Closest Pair Problem Using Leafary Trees
1. O(n) expected time randomized algorithms are presented in [4, 5, 7]. They use hashing and assume the floor function as a unit-cost operation. 2. An O(n log log n) time deterministic algorithm is presented in [3], which uses hashing and assumes the floor function as a unit-cost operation. 3. In [1] an O(n) expected time randomized algorithm using the real-RAM model is presented. This algorithm as...
A New Algorithm for Finding Closest Pair of Vectors
Given n vectors x_0, x_1, ..., x_{n−1} in {0, 1}^m, how to find two vectors whose pairwise Hamming distance is minimum? This problem is known as the Closest Pair Problem. If these vectors are generated uniformly at random except two of them are correlated with Pearson-correlation coefficient ρ, then the problem is called the Light Bulb Problem. In this work, we propose a novel coding-based scheme ...
Journal: J. Algorithms
Volume: 25
Publication date: 1997